
# Logistic/Probit regression

## Linear probability model

$$
E(y|x) = x'\beta
$$
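As a quick illustration, the linear probability model is just OLS on a binary outcome. A minimal sketch on simulated data, assuming `numpy` and `statsmodels`; the variable names and data-generating process are made up for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (0.5 * x + rng.logistic(size=n) > 0).astype(float)  # binary outcome

lpm = sm.OLS(y, sm.add_constant(x)).fit()
# The slope is the (constant) estimated marginal effect of x on P(y=1|x),
# but fitted probabilities from this model can fall outside [0, 1].
print(lpm.params)
```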

## Probit & Logit

$$
\begin{aligned}
&P(y=1|x)=F(x'\beta)\\
&P(y=0|x)=1-F(x'\beta)
\end{aligned}
$$
- $F(x)=\Phi(x)$, the standard normal CDF: Probit, convenient when endogeneity must be addressed.
- $F(x)=\operatorname{Logit}(x)$, the logistic CDF: Logit, convenient for computational simplicity.
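Both link functions can be fitted by maximum likelihood with standard routines. A minimal sketch on simulated data, assuming `statsmodels` is available; the data-generating process is illustrative only:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)
y = (1.0 * x + rng.logistic(size=n) > 0).astype(int)

# Same linear index, two different link functions F.
probit_fit = sm.Probit(y, X).fit(disp=0)
logit_fit = sm.Logit(y, X).fit(disp=0)
print(probit_fit.params)
print(logit_fit.params)  # roughly 1.6x the probit coefficients (standard rule of thumb)
```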

## Logistic regression

The CDF of the logistic distribution has the form of a logistic function:

$$
F(x)=\frac{1}{1+e^{-(x-\mu)/s}}
$$

Logistic regression assumes:

$$
P(Y=1 \mid X)=\frac{1}{1+\exp\left(-\left(w_{0}+\sum_{i=1}^{n} w_{i} X_{i}\right)\right)}
$$
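This probability is a plain sigmoid of the linear index, so it can be evaluated directly. A minimal sketch, assuming `numpy`; the helper `p_y1` and the example weights are hypothetical:

```python
import numpy as np

def p_y1(X, w0, w):
    """P(Y=1|X) = 1 / (1 + exp(-(w0 + X @ w))) for an (n, k) feature matrix X."""
    return 1.0 / (1.0 + np.exp(-(w0 + X @ w)))

X = np.array([[0.5, -1.0],
              [2.0, 0.3]])
print(p_y1(X, w0=0.1, w=np.array([0.8, -0.5])))
```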

### Latent score interpretation

$$
y=1 \quad \text{iff} \quad y^{*}=x'w+\epsilon >0, \qquad \epsilon \sim \text{Logistic}.
$$

$$
\begin{aligned}
P(y=1|x)&=P(y^* >0|x)\\
&=P(\epsilon > -x'w \mid x)\\
&=P(\epsilon < x'w \mid x) \quad \text{(by symmetry of the logistic distribution)}\\
&=F_{\epsilon}(x'w)
\end{aligned}
$$
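A quick simulation check of this latent-score reading: drawing $\epsilon$ from a logistic distribution and thresholding the latent score reproduces $F_{\epsilon}(x'w)$. Illustrative sketch, assuming `numpy`:

```python
import numpy as np

rng = np.random.default_rng(2)
xw = 0.7                                    # a fixed value of the index x'w
eps = rng.logistic(size=200_000)            # epsilon ~ Logistic(0, 1)
y = (xw + eps > 0).astype(float)            # y = 1{y* > 0}
print(y.mean())                             # empirical P(y=1|x), about 0.668
print(1.0 / (1.0 + np.exp(-xw)))            # F_eps(x'w), the logistic CDF at 0.7
```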

## Marginal effect

$$
\text{Odds}=\frac{P(y=1|x)}{P(y=0|x)}=e^{x'w}
$$

$w$ represents the marginal effect of $x$ on the log of the odds.
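Equivalently, $e^{w_j}$ is the odds ratio for a one-unit change in $x_j$. A tiny sketch with made-up coefficients:

```python
import numpy as np

w = np.array([0.8, -0.5])   # illustrative logit coefficients
print(np.exp(w))            # odds ratios per unit change in x_j: about 2.23 and 0.61
```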

## MLE

$$
\hat{W}=\underset{W}{\arg\max} \sum_{l} \ln P\left(Y^{l} \mid X^{l}, W\right)
$$
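The same objective can be maximized with a generic optimizer. A minimal sketch, assuming `numpy` and `scipy`; the data-generating process and starting values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w_true = np.array([0.2, 1.0])
y = (X @ w_true + rng.logistic(size=n) > 0).astype(float)

def neg_loglik(w):
    # Negative of sum_l ln P(Y^l | X^l, w) under the logistic link.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

w_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print(w_hat)  # close to w_true in large samples
```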

## Log-loss function

From a machine learning perspective, the MLE objective can be rewritten as a log-loss cost function, with an additional penalty (regularization) term.

$$
\begin{aligned}
\hat{W} &= \underset{W}{\arg\min}\; -C\sum_{l} \ln P\left(Y^{l} \mid X^{l}, W\right) + \|W\| \\
&= \underset{W}{\arg\min}\; -C\sum_{l} \left[ Y^l \ln\!\left(P(Y^l)\right) + (1-Y^l) \ln\!\left(1-P(Y^l)\right) \right] + \|W\|
\end{aligned}
$$
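This is essentially the objective scikit-learn's `LogisticRegression` minimizes, up to the exact form of the penalty (its default is an L2 penalty $\tfrac{1}{2}\|W\|^2$ rather than $\|W\|$). An illustrative sketch on simulated data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 2))
y = (X @ np.array([1.0, -0.5]) + rng.logistic(size=n) > 0).astype(int)

# Larger C => weaker regularization (C multiplies the data-fit term).
clf = LogisticRegression(C=1.0, penalty="l2").fit(X, y)
print(clf.intercept_, clf.coef_)
print(log_loss(y, clf.predict_proba(X)))  # average log-loss on the training sample
```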

## Probit with endogeneity

From the latent-score interpretation, $y_1 = 1_{\{y_1^* > 0\}}$. In the structural model below, $(u,v)$ are bivariate normal with $\operatorname{var}(u)=1$:

$$
\begin{aligned}
&y_1^* = z_1'\delta_1 + \alpha_1 y_2 + u;\\
&y_2 = z_2' \delta_2 + v;\\
&u = \theta v + e
\end{aligned}
$$

$y_2$ is correlated with $u$ through $v$.

We have

$$
\operatorname{var}(e)=1-\frac{\operatorname{cov}(v,u)^2}{\operatorname{var}(v)\operatorname{var}(u)}=1-\rho^2
$$

Rescaling by $\sqrt{1-\rho^2}$ (a GLS-style standardization) gives $\tilde{e}=e/\sqrt{1-\rho^2}\sim N(0,1)$:

$$
\begin{aligned}
&y_1^* = z_1'\delta_1 + y_2\alpha_1 + v\theta + e\\
&y_1^*/\sqrt{1-\rho^2} = z_1'\delta_1/\sqrt{1-\rho^2} + y_2\alpha_1/\sqrt{1-\rho^2} + v\theta/\sqrt{1-\rho^2} + e/\sqrt{1-\rho^2}\\
&\tilde{y}_1^* = z_1'\tilde{\delta}_1 + y_2\tilde{\alpha}_1 + v\tilde{\theta} + \tilde{e}
\end{aligned}
$$

The probit model then reduces to:

$$
P(y_1=1|z_1,y_2,v)=P(\tilde{y}_1^*>0|z_1,y_2,v)=\Phi(z_1'\tilde{\delta}_1+\tilde{\alpha}_1 y_2+\tilde{\theta} v)
$$

Two-step estimation:

1. Estimate $v$ from the first-stage regression of $y_2$ on $z_2$.
2. Run a probit of $y_1$ on $z_1$, $y_2$, and $\hat{v}$, as sketched below.
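A minimal sketch of this control-function two-step on simulated data, assuming `numpy` and `statsmodels`; all parameter values and the data-generating process are illustrative, and the second-step coefficients estimate the rescaled (tilde) parameters:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
z1 = rng.normal(size=n)                     # exogenous regressor in the outcome equation
z2 = rng.normal(size=n)                     # instrument, excluded from the outcome equation
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(scale=np.sqrt(0.75), size=n)   # var(u) = 1, corr(u, v) = 0.5
y2 = 1.0 * z2 + v                           # endogenous regressor
y1 = (1.0 * z1 + 0.8 * y2 + u > 0).astype(int)

# Step 1: first-stage regression of y2 on the exogenous variables; keep the residual v_hat.
first = sm.OLS(y2, sm.add_constant(np.column_stack([z1, z2]))).fit()
v_hat = first.resid

# Step 2: probit of y1 on z1, y2 and the control function v_hat.
X2 = sm.add_constant(np.column_stack([z1, y2, v_hat]))
second = sm.Probit(y1, X2).fit(disp=0)
print(second.params)  # estimates of the rescaled (tilde) coefficients
```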

## Probit with fixed effects

$$
y_{it}^{*} = x_{it}'\beta + \alpha_i + \varepsilon_{it}
$$
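One crude way to fit this is a pooled probit with individual dummies for $\alpha_i$. This is only an illustration: with short panels the dummy approach suffers from the incidental parameters problem, so the sketch below (assuming `numpy`, `pandas`, and `statsmodels`) should not be read as a recommended estimator:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_i, n_t = 50, 20
i = np.repeat(np.arange(n_i), n_t)                 # individual index
alpha = 0.5 * rng.normal(size=n_i)                 # individual effects alpha_i
x = rng.normal(size=n_i * n_t)
y = (1.0 * x + alpha[i] + rng.normal(size=n_i * n_t) > 0).astype(int)

# x plus one dummy per individual (the dummies absorb alpha_i and the intercept).
X = np.column_stack([x, pd.get_dummies(i, dtype=float).to_numpy()])
fe_probit = sm.Probit(y, X).fit(method="bfgs", maxiter=500, disp=0)
print(fe_probit.params[0])  # estimate of beta; biased when n_t is small (incidental parameters)
```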